NSF PAR Search | NSF Public Access Repository

Impacts of Sex Ratio Meiotic Drive on Genome Structure and Function in a Stalk-Eyed Fly

https://doi.org/10.1093/gbe/evad118

Reinhardt, Josephine A; Baker, Richard H; Zimin, Aleksey V; Ladias, Chloe; Paczolt, Kimberly A; Werren, John H; Hayashi, Cheryl Y; Wilkinson, Gerald S (July 2023, Genome Biology and Evolution)
Zufall, Rebecca (Ed.)
Stalk-eyed flies in the genus Teleopsis carry selfish genetic elements that induce sex ratio (SR) meiotic drive and impact the fitness of male and female carriers. Here, we assemble and describe a chromosome-level genome assembly of the stalk-eyed fly, Teleopsis dalmanni, to elucidate patterns of divergence associated with SR. The genome contains tens of thousands of transposable element (TE) insertions and hundreds of transcriptionally and insertionally active TE families. By resequencing pools of SR and ST males using short and long reads, we find widespread differentiation and divergence between XSR and XST associated with multiple nested inversions involving most of the SR haplotype. Examination of genomic coverage and gene expression data revealed seven X-linked genes with elevated expression and coverage in SR males. The most extreme and likely drive candidate involves an XSR-specific expansion of an array of partial copies of JASPer, a gene necessary for maintenance of euchromatin and associated with regulation of TE expression. In addition, we find evidence for rapid protein evolution between XSR and XST for testis expressed and novel genes, that is, either recent duplicates or lacking a Dipteran ortholog, including an X-linked duplicate of maelstrom, which is also involved in TE silencing. Overall, the evidence suggests that this ancient XSR polymorphism has had a variety of impacts on repetitive DNA and its regulation in this species.
Full Text Available
The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual

https://doi.org/10.1093/g3journal/jkac321

Chao, Kuan-Hao; Zimin, Aleksey V.; Pertea, Mihaela; Salzberg, Steven L. (January 2023, G3: Genes, Genomes, Genetics)
Emerson, J. J. (Ed.)
We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding. A comprehensive comparison between the genes revealed that 235 protein-coding genes were substantially different between the individuals, with frameshifts or truncations affecting the protein-coding sequence. Most of these were heterozygous variants in which one gene copy was unaffected. This represents the first gene-level comparison between two finished, annotated individual human genomes.
A genome sequence for the threatened whitebark pine

https://doi.org/10.1093/g3journal/jkae061

Neale, David B.; Zimin, Aleksey V.; Meltzer, Amy; Bhattarai, Akriti; Amee, Maurice; Figueroa Corona, Laura; Allen, Brian J.; Puiu, Daniela; Wright, Jessica; De La Torre, Amanda R.; et al (March 2024, G3: Genes, Genomes, Genetics)

Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in the Western contiguous United States and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality. Genomic technologies can contribute to a faster, more cost-effective approach to the traditional practices of identifying disease-resistant, climate-adapted seed sources for restoration. With deep-coverage Illumina short reads of haploid megagametophyte tissue and Oxford Nanopore long reads of diploid needle tissue, followed by a hybrid, multistep assembly approach, we produced a final assembly containing 27.6 Gb of sequence in 92,740 contigs (N50 537,007 bp) and 34,716 scaffolds (N50 2.0 Gb). Approximately 87.2% (24.0 Gb) of total sequence was placed on the 12 WBP chromosomes. Annotation yielded 25,362 protein-coding genes, and over 77% of the genome was characterized as repeats. WBP has demonstrated the greatest variation in resistance to WPBR among the North American white pines. Candidate genes for quantitative resistance include disease resistance genes known as nucleotide-binding leucine-rich repeat receptors (NLRs). A combination of protein domain alignments and direct genome scanning was employed to fully describe the 3 subclasses of NLRs. Our high-quality reference sequence and annotation provide a marked improvement in NLR identification compared to previous assessments that leveraged de novo-assembled transcriptomes.
The SAMBA tool uses long reads to improve the contiguity of genome assemblies

https://doi.org/10.1371/journal.pcbi.1009860

Zimin, Aleksey V.; Salzberg, Steven L. (February 2022, PLOS Computational Biology)
Shao, Mingfu (Ed.)
Third-generation sequencing technologies can generate very long reads with relatively high error rates. The lengths of the reads, which sometimes exceed one million bases, make them invaluable for resolving complex repeats that cannot be assembled using shorter reads. Many high-quality genome assemblies have already been produced, curated, and annotated using the previous generation of sequencing data, and full re-assembly of these genomes with long reads is not always practical or cost-effective. One strategy to upgrade existing assemblies is to generate additional coverage using long-read data, and add that to the previously assembled contigs. SAMBA is a tool that is designed to scaffold and gap-fill existing genome assemblies with additional long-read data, resulting in substantially greater contiguity. SAMBA is the only tool of its kind that also computes and fills in the sequence for all spanned gaps in the scaffolds, yielding much longer contigs. Here we compare SAMBA to several similar tools capable of re-scaffolding assemblies using long-read data, and we show that SAMBA yields better contiguity and introduces fewer errors than competing methods. SAMBA is open-source software that is distributed at https://github.com/alekseyzimin/masurca.
Full Text Available
The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies

https://doi.org/10.1371/journal.pcbi.1007981

Zimin, Aleksey V.; Salzberg, Steven L. (June 2020, PLOS Computational Biology)

Full Text Available
Chromosome-Scale Assembly of the Bread Wheat Genome Reveals Thousands of Additional Gene Copies

https://doi.org/10.1534/genetics.120.303501

Alonge, Michael; Shumate, Alaina; Puiu, Daniela; Zimin, Aleksey; Salzberg, Steven L. (October 2020, Genetics)

Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of non-gap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2,000 genes that were previously unplaced. We also discovered more than 5,700 additional gene copies, facilitating the accurate annotation of functional gene duplications, including at the Ppd-B1 photoperiod response locus.
Full Text Available
Transcriptome assembly from long-read RNA-seq alignments with StringTie2

https://doi.org/10.1186/s13059-019-1910-1

Kovaka, Sam; Zimin, Aleksey V.; Pertea, Geo M.; Razaghi, Roham; Salzberg, Steven L.; Pertea, Mihaela (December 2019, Genome Biology)

Full Text Available
The genome of the American groundhog, Marmota monax

https://doi.org/10.12688/f1000research.25970.1

Puiu, Daniela; Zimin, Aleksey; Shumate, Alaina; Ge, Yuchen; Qiu, Jiabin; Bhaskaran, Manoj; Salzberg, Steven L. (January 2020, F1000Research)

We sequenced the genome of the North American groundhog, Marmota monax, also known as the woodchuck. Our sequencing strategy included a combination of short, high-quality Illumina reads plus long reads generated by both Pacific Biosciences and Oxford Nanopore instruments. Assembly of the combined data produced a genome of 2.74 Gbp in total length, with an N50 contig size of 1,094,236 bp. To annotate the genome, we mapped the genes from another M. monax genome and from the closely related Alpine marmot, Marmota marmota, onto our assembly, resulting in 20,559 annotated protein-coding genes and 28,135 transcripts. The genome assembly and annotation are available in GenBank under BioProject PRJNA587092.
Full Text Available
Human contamination in bacterial genomes has created thousands of spurious proteins

https://doi.org/10.1101/gr.245373.118

Breitwieser, Florian P; Pertea, Mihaela; Zimin, Aleksey; Salzberg, Steven L (January 2019, Genome Research)

Full Text Available
Genome assembly and characterization of a complex zfBED-NLR gene-containing disease resistance locus in Carolina Gold Select rice with Nanopore sequencing

https://doi.org/10.1371/journal.pgen.1008571

Read, Andrew C.; Moscou, Matthew J.; Zimin, Aleksey V.; Pertea, Geo; Meyer, Rachel S.; Purugganan, Michael D.; Leach, Jan E.; Triplett, Lindsay R.; Salzberg, Steven L.; Bogdanove, Adam J. (January 2020, PLOS Genetics)
Coaker, Gitta (Ed.)
Full Text Available
